ABSTRACT


The Kolmogorov-Smirnov goodness-of-fit test is exact only when the hypothesized distribution is continuous, but recently Conover has extended the Kolmogorov-Smirnov test to obtain a test that is exact in the case of discrete distributions. Reasons for using this procedure instead of the regular Kolmogorov-Smirnov test when the hypothesized distribution is discrete are given. A computer subroutine is developed to allow easy use of the procedure. The subroutine is then used to demonstrate the conservatism of the regular Kolmogorov-Smirnov test in this case and to investigate some properties of the asymptotic distributions of the test statistics.
I. INTRODUCTION


Various statistical problems reduce to the choice of a parametric form of a probability distribution of a population. A one-sample goodness-of-fit test is a test of the hypothesis H₀: F(x) = H(x) for all x, where F is the unknown cumulative distribution function of the population in question and H is the hypothesized cumulative distribution function. There are various test statistics that can be used in goodness-of-fit tests. The choice of which statistic to use depends on the size of the sample, whether F is continuous or discrete, whether all of the parameters of H are known or are estimated from the sample, and whether H is a member of a certain class of distributions. The two most commonly used tests are the Chi-square and Kolmogorov-Smirnov (K-S) type goodness-of-fit tests.

The Chi-square test is based on a test statistic that is asymptotically distributed as a Chi-square random variable, and therefore it is used when the sample size is relatively large. The Chi-square test does not require major assumptions on the hypothesized distribution and can be used when the parameters of the hypothesized distribution are estimated from the sample. The hypothesized distribution may be either discrete or continuous, and the data may be observations of the population or grouped observations of the population.





The Kolmogorov-Smirnov test statistic has a known distribution for all sample sizes, which makes the test exact. The K-S test may be preferred to the Chi-square test when the sample size is small because of this exactness. There is some controversy as to which of the two tests is more powerful. The relative power has been studied (see Massey), and the K-S test appears to be more powerful in some cases while the Chi-square test is more powerful in others. Traditionally, a major requirement for the K-S test has been that the hypothesized distribution, H, must be continuous. If H is not continuous, then a test of the hypothesis H₀ using the traditional K-S tables is known to be conservative (see Noether [9]).

Unfortunately, the exact degree of conservatism is not known. W. J. Conover [3] derived a method for using a K-S type test when the hypothesized distribution is discrete or when the data have already been grouped (see Darmosiswoys [5]), but the computations required by this method are long and involved. In what follows, a program employing Conover's method is developed for use on a digital computer. This program is then used to investigate the asymptotic distributions of the test statistics.

A description of the notation used herein is contained in the following list:

Notation                 Description

Sₙ                       Empirical distribution function of a random sample of size n.
n                        Sample size.
α                        Level of significance of test.
α̂                        Critical level of test.
F                        Unknown distribution function of a random sample.
H                        Hypothesized distribution function.
X₁, X₂, …, Xₙ            Random sample of size n.
X₍₁₎, X₍₂₎, …, X₍ₙ₎      Ordered rearrangement of the random sample X₁, X₂, …, Xₙ in ascending order.
H₀                       A null hypothesis in testing hypotheses.
H₁                       An alternate hypothesis in testing hypotheses.





II. DESCRIPTION OF CONOVER'S PROCEDURE 


A. KOLMOGOROV-SMIRNOV TYPE TESTS AND TEST STATISTICS 


One-sample K-S type tests are goodness-of-fit tests that compare the empirical cumulative distribution function of a random sample to a hypothesized cumulative distribution function. If the empirical cumulative distribution function is not close, in the sup norm sense, to the hypothesized cumulative distribution function, then the conclusion is made that the random sample did not come from the hypothesized distribution.

Let X₁, X₂, …, Xₙ be independent random variables (observations), each having the same unknown distribution F. If X₍₁₎, X₍₂₎, …, X₍ₙ₎ represents the rearrangement of X₁, X₂, …, Xₙ in ascending order, then the empirical cumulative distribution function Sₙ is defined by:

$$S_n(x) = \begin{cases} 0, & x < X_{(1)}, \\ k/n, & X_{(k)} \le x < X_{(k+1)}, \quad k = 1, 2, \ldots, n-1, \\ 1, & x \ge X_{(n)}. \end{cases}$$


The K-S test may be used to test the three following sets of hypotheses:

1. H₀: F(x) = H(x) for all x
   H₁: F(x) ≠ H(x) for some x

2. H₀: F(x) ≥ H(x) for all x
   H₁: F(x) < H(x) for some x

3. H₀: F(x) ≤ H(x) for all x
   H₁: F(x) > H(x) for some x


In each set of hypotheses, H is a specified distribution function. One of the following test statistics is used, depending on the hypotheses being tested:

1. D = supₓ |H(x) − Sₙ(x)|
2. D⁻ = supₓ (H(x) − Sₙ(x))
3. D⁺ = supₓ (Sₙ(x) − H(x))
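When H is discrete and every observation is one of its mass points, Sₙ and H are both step functions that are constant between consecutive mass points and equal to zero to the left of the first one, so each supremum can be taken over the mass points alone (together with the value 0 attained to their left). The Python sketch below rests on that assumption; the name `ks_statistics` and the representation of H as a dictionary from mass points to values of H are illustrative choices, not part of the thesis. Exact fractions keep ties between Sₙ and H from being blurred by rounding.

```python
from fractions import Fraction


def ks_statistics(sample, cdf):
    """Return (d, d_minus, d_plus) for a discrete hypothesized H.

    `cdf` maps each mass point a of H to H(a); every observation is assumed
    to be one of these mass points, so both step functions are constant
    between consecutive mass points.
    """
    n = len(sample)
    d_minus = Fraction(0)  # to the left of the first mass point, H = S_n = 0
    d_plus = Fraction(0)
    for a in sorted(cdf):
        s_n = Fraction(sum(1 for x in sample if x <= a), n)  # S_n(a)
        h = Fraction(cdf[a])
        d_minus = max(d_minus, h - s_n)  # running sup of H - S_n
        d_plus = max(d_plus, s_n - h)    # running sup of S_n - H
    return max(d_minus, d_plus), d_minus, d_plus


# Conover's example 1 (Appendix A): H discrete uniform on 1, ..., 5, n = 10
H = {i: Fraction(i, 5) for i in range(1, 6)}
sample = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
d, d_minus, d_plus = ks_statistics(sample, H)
print(d, d_minus, d_plus)  # 2/5 0 2/5
```

The sample is example 1 of Appendix A, for which the hand calculation gives d⁻ = 0 and d⁺ = d = .4.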


For each of the three sets of hypotheses, a sufficiently large observed value of the test statistic indicates that the null hypothesis should be rejected. If α is the level of significance desired in the test of hypotheses 1, 2, or 3, then the critical value c, c⁻, or c⁺ is determined as follows, according to which set of hypotheses is being tested:

1. P(D ≥ c) = α
2. P(D⁻ ≥ c⁻) = α
3. P(D⁺ ≥ c⁺) = α

"P" in the above equations is the probability measure associated with H. If the observation d, d⁻, or d⁺ of the statistic D, D⁻, or D⁺, respectively, equals or exceeds the corresponding critical value, the null hypothesis is rejected at a level of significance of α. Instead of determining the critical values, we may compute the critical level, α̂ (for example, α̂ = P(D ≥ d) in the two-sided case), which is the smallest significance level at which the null hypothesis would be rejected for the given observation d, d⁻, or d⁺, and compare it with α. If α̂ ≤ α, the null hypothesis is rejected, while if α̂ > α, it is not rejected. The two methods are equivalent, and the level of significance in both is α.

If H₀ is true and H is continuous, it is known (see Darling [4]) that the distributions of D, D⁻, and D⁺ are independent of H. Tables of critical values for various levels of significance of the test statistics D, D⁻, and D⁺ are available for use in the K-S test when H is continuous. When H is discrete, the distributions of D, D⁻, and D⁺ are not independent of H, and the standard K-S tables cannot be used to find the critical levels of the test statistics. The standard tables can, however, be used to give a conservative approximation of the level of significance of the test, as the following demonstration shows. Let Y be a discrete random variable with distribution function R. If a₁, a₂, … are the points of discontinuity of R, with associated probabilities p₁, p₂, …, let Z be any continuous random variable with distribution function T such that T(aᵢ) − T(aᵢ₋₁) = pᵢ, i = 1, 2, … (taking T(a₀) = 0). If a is any point such that aᵢ ≤ a < aᵢ₊₁, then

$$R(a) = R(a_i) = T(a_i). \tag{1}$$

Let Y₁, Y₂, …, Yₙ be a random sample from R. This random sample can be thought of as having been determined by a random sample Z₁, Z₂, …, Zₙ from T by setting Yⱼ = aᵢ if aᵢ₋₁ < Zⱼ ≤ aᵢ, j = 1, 2, …, n. If Rₙ is the empirical distribution function of Y₁, Y₂, …, Yₙ and Tₙ is the empirical distribution function of Z₁, Z₂, …, Zₙ, then

$$R_n(a_i) = T_n(a_i), \quad i = 1, 2, \ldots \tag{2}$$

If D* = supₐ |Rₙ(a) − R(a)|, then, since R is discrete,

$$D^* = \sup_i |R_n(a_i) - R(a_i)|. \tag{3}$$

(1), (2), and (3) imply |Rₙ(aᵢ) − R(aᵢ)| = |Tₙ(aᵢ) − T(aᵢ)| for all i, and

$$D^* = \sup_i |R_n(a_i) - R(a_i)| = \sup_i |T_n(a_i) - T(a_i)| \le \sup_z |T_n(z) - T(z)| = D,$$

which implies P(D* ≥ c) ≤ P(D ≥ c) for any c. The same argument can be used for the one-sided statistics to show that P(D*⁻ ≥ c) ≤ P(D⁻ ≥ c) and P(D*⁺ ≥ c) ≤ P(D⁺ ≥ c). Therefore, if the standard tables are used to construct a test when H is discrete, the test is conservative.
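The inequality D* ≤ D can also be checked numerically. This sketch assumes, purely for illustration, that T is the uniform distribution on (0, 1) and that R places mass 1/m on each point aᵢ = i/m, so that R(aᵢ) = T(aᵢ); each Zⱼ is mapped to the mass point aᵢ with aᵢ₋₁ < Zⱼ ≤ aᵢ, as in the construction above. The function names, sample size, number of trials, and seed are arbitrary.

```python
import math
import random


def d_continuous(z):
    """D = sup_z |T_n(z) - T(z)| for T uniform on (0, 1); the supremum is
    attained at an order statistic, approached from the left or the right."""
    z = sorted(z)
    n = len(z)
    return max(max((k + 1) / n - x, x - k / n) for k, x in enumerate(z))


def d_discrete(z, m):
    """D* = max_i |R_n(a_i) - R(a_i)| with a_i = i/m, after setting Y_j = a_i
    when a_{i-1} < Z_j <= a_i (so that R(a_i) = T(a_i) = i/m)."""
    n = len(z)
    y = [max(1, math.ceil(m * x)) for x in z]  # index i of the mass point a_i
    return max(abs(sum(1 for v in y if v <= i) / n - i / m) for i in range(1, m + 1))


random.seed(1)
violations = 0
for _ in range(2000):
    z = [random.random() for _ in range(20)]
    if d_discrete(z, 5) > d_continuous(z) + 1e-12:  # small tolerance for rounding
        violations += 1
print("samples with D* > D:", violations)
```

Because D* only inspects Tₙ − T at the points aᵢ, no trial should report D* > D.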

Slakter [10] demonstrates the conservatism of the continuous K-S test when H is discrete by using a computer simulation to estimate the actual level of significance, αₐ, of the test of H₀ when H is the discrete uniform distribution with k mass points. Ten thousand random samples were generated from the hypothesized distribution, and the statistic D was evaluated for each. αₐ was then estimated as the proportion of the ten thousand replications in which H₀ was rejected. This process was repeated for various sample sizes and various k, and in all cases αₐ was considerably less than the nominal α. For example, with k = 10, 50 observations, and α = .05, αₐ turned out to be far smaller than .05.
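A simulation in the spirit of Slakter's can be sketched as follows. It assumes the discrete uniform H on the integers 1, …, k, draws every sample from H itself (so H₀ is true), and rejects when D exceeds the large-sample continuous critical value 1.36/√n for α = .05. The seed, the number of replications, and the use of the asymptotic critical value instead of an exact table entry are assumptions of this sketch, not details of Slakter's study.

```python
import math
import random


def d_statistic(sample, k):
    """D = max over the mass points j = 1, ..., k of |S_n(j) - H(j)|,
    for H discrete uniform on 1, ..., k."""
    n = len(sample)
    counts = [0] * (k + 1)
    for x in sample:
        counts[x] += 1
    cumulative, d = 0, 0.0
    for j in range(1, k + 1):
        cumulative += counts[j]
        d = max(d, abs(cumulative / n - j / k))
    return d


def estimated_level(k=10, n=50, reps=10_000, seed=12345):
    """Fraction of samples drawn from H itself that the continuous K-S test
    rejects at nominal level .05, using the critical value 1.36/sqrt(n)."""
    rng = random.Random(seed)
    critical = 1.36 / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        sample = [rng.randint(1, k) for _ in range(n)]
        if d_statistic(sample, k) > critical:
            rejections += 1
    return rejections / reps


alpha_a = estimated_level()
print(f"estimated actual level: {alpha_a:.4f} (nominal .05)")
```

The estimate is a random quantity, but with these settings it falls below the nominal .05, in line with the conservatism described above.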

The use of a conservative test might at first seem desirable, since it guarantees that the actual probability of rejecting the hypothesis when it is true is less than the predetermined probability of rejecting a true hypothesis. Unfortunately, this causes a decrease in the power of the test. Because the size of this loss of power is unknown, it is desirable to be able to calculate the exact significance level of the test when H is discrete.

Since the distributions of D, D⁻, and D⁺ depend on H, a prohibitive number of tables would be required for testing H₀ when H is discrete, even for simple distribution families. For this reason, the use of K-S tests when H is discrete was not investigated until recently, when W. J. Conover demonstrated a method for finding the exact critical level (approximate in the two-sided case) in this instance. The program presented in this thesis makes Conover's procedure practical to use.


B. CONOVER'S PROCEDURE

1. Distributions of Test Statistics
Conover derives the distributions of D, D⁻, and D⁺ for H continuous or discontinuous in [3]. He shows that, for 0 < t ≤ 1,

$$P(D^+ \ge t) = \sum_{k=1}^{n} \binom{n}{k-1} f_k^{\,n-k+1} e_k,$$

where the eₖ's are defined recursively as follows:

$$e_1 = 1, \qquad e_k = 1 - \sum_{j=1}^{k-1} \binom{k-1}{j-1} f_j^{\,k-j} e_j, \quad k = 2, 3, \ldots, n+1, \tag{4}$$

with

$$f_k = P\left\{X_1 < H^{-1}\left(\frac{n-k+1}{n} - t\right)\right\}, \quad 1 \le k \le n+1. \tag{5}$$

The Xᵢ's are independent, identically distributed random variables with distribution function F. H⁻¹(p) is defined as sup{x : H(x) ≤ p} for 0 ≤ p ≤ 1 and as minus infinity for p < 0. If H is continuous, then with the use of the probability integral transform it is easy to see that fₖ = 1 − t − (k − 1)/n, and (4) reduces to the form of the regular K-S statistic obtained by Birnbaum and Tingey [2].


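We note that if k > n(1 − t) + 1, then from (5), fₖ = 0, and the distribution of D⁺ becomes

$$P(D^+ \ge t) = \sum_{j=1}^{m_t+1} \binom{n}{j-1} f_j^{\,n-j+1} e_j, \tag{6}$$

where mₜ is the greatest integer less than or equal to n(1 − t). The distribution of D⁻ is very similar to that of D⁺ and is given by

$$P(D^- \ge t) = \sum_{k=1}^{n} \binom{n}{k-1} c_k^{\,n-k+1} b_k,$$

where the bₖ's are defined recursively as follows:

$$b_1 = 1, \qquad b_k = 1 - \sum_{j=1}^{k-1} \binom{k-1}{j-1} c_j^{\,k-j} b_j, \quad k = 2, 3, \ldots, n+1, \tag{7}$$

with

$$c_k = P\left\{X_1 > H^{-1}\left(t + \frac{k-1}{n}\right)\right\}, \quad 1 \le k \le n+1. \tag{8}$$

If k > n(1 − t) + 1, then t + (k − 1)/n > 1 in (8), which implies cₖ = 0, and the distribution of D⁻ becomes

$$P(D^- \ge t) = \sum_{j=1}^{m_t+1} \binom{n}{j-1} c_j^{\,n-j+1} b_j. \tag{9}$$

The two-sided probability is approximated by P(D ≥ t) ≈ P(D⁺ ≥ t) + P(D⁻ ≥ t), and the following bounds for P(D ≥ t) are given:

$$P(D^+ \ge t) + P(D^- \ge t) - P(D^+ \ge t)\,P(D^- \ge t) \le P(D \ge t) \le P(D^+ \ge t) + P(D^- \ge t). \tag{10}$$

In most tests, P(D⁺ ≥ t) and P(D⁻ ≥ t) are small and, therefore, the maximum error in this approximation, which is at most the product P(D⁺ ≥ t)P(D⁻ ≥ t), is very small.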
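Equations (4) and (6), and likewise (7) and (9), share a single recursive form, so one helper can evaluate either tail probability once the levels fₖ (or cₖ) are known. The following sketch is an illustration, not a transcription of DISKS: it uses exact rational arithmetic (`fractions.Fraction`) in place of double precision, and the helper names are hypothetical. As a check, it substitutes the continuous-case levels fₖ = 1 − t − (k − 1)/n and compares the result with the Birnbaum and Tingey closed form

$$P(D^+ \ge t) = t \sum_{j=0}^{m_t} \binom{n}{j} \left(1 - t - \frac{j}{n}\right)^{n-j} \left(t + \frac{j}{n}\right)^{j-1}.$$

```python
from fractions import Fraction
from math import comb, floor


def conover_tail(levels, n):
    """Evaluate sum_{k=1}^{K} C(n, k-1) * g_k**(n-k+1) * a_k, where K = len(levels),
    a_1 = 1 and a_k = 1 - sum_{j=1}^{k-1} C(k-1, j-1) * g_j**(k-j) * a_j.
    With g_k = f_k this is (4) and (6); with g_k = c_k it is (7) and (9)."""
    a = []
    total = Fraction(0)
    for k, g in enumerate(levels, start=1):
        inner = sum((comb(k - 1, j - 1) * levels[j - 1] ** (k - j) * a[j - 1]
                     for j in range(1, k)), Fraction(0))
        a_k = 1 - inner
        a.append(a_k)
        total += comb(n, k - 1) * g ** (n - k + 1) * a_k
    return total


def continuous_levels(n, t):
    """f_k = 1 - t - (k-1)/n for k = 1, ..., m_t + 1 (continuous H)."""
    m_t = floor(n * (1 - t))
    return [1 - t - Fraction(k - 1, n) for k in range(1, m_t + 2)]


def birnbaum_tingey(n, t):
    """Closed-form P(D+ >= t) for continuous H."""
    m_t = floor(n * (1 - t))
    return t * sum(comb(n, j) * (1 - t - Fraction(j, n)) ** (n - j)
                   * (t + Fraction(j, n)) ** (j - 1) for j in range(m_t + 1))


def two_sided_bounds(p_plus, p_minus):
    """Lower and upper bounds (10) on P(D >= t)."""
    return p_plus + p_minus - p_plus * p_minus, p_plus + p_minus


n, t = 10, Fraction(1, 4)
p = conover_tail(continuous_levels(n, t), n)
print(p == birnbaum_tingey(n, t))   # True: the recursion reproduces the closed form
low, high = two_sided_bounds(p, p)  # for continuous H, D+ and D- share this distribution
print(float(low), float(high))
```

The equality is exact rather than approximate because, with fₖ linear in k, the recursion is an instance of Abel's binomial identity.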


2. Calculation of Critical Levels


a. Critical Level for D⁻

Let d⁻ = supₓ (H(x) − Sₙ(x)) be determined from the observations. For each k such that 1 ≤ k ≤ n(1 − d⁻) + 1, draw a horizontal line with ordinate (k − 1)/n + d⁻ on the graph of H. cₖ is one minus this ordinate unless the line intersects the graph of H at a discontinuity, in which case cₖ is one minus the height of H at the top of the jump. The bₖ's are then computed from (7), and (9) is used to compute the critical level, P(D⁻ ≥ d⁻).
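Section a can be sketched in Python as follows, assuming H is supplied as the list of its distinct step heights (including 0 and 1) in exact fractions. Taking the lowest step height at or above each line reproduces both cases of the rule: it is the top of the jump when the line crosses a jump, and the ordinate itself when the line meets a step. The function names are hypothetical, and exact arithmetic replaces the double precision used by DISKS.

```python
from fractions import Fraction
from math import comb, floor


def conover_tail(levels, n):
    """sum_k C(n, k-1) c_k**(n-k+1) b_k with the b_k of recursion (7)."""
    b = []
    total = Fraction(0)
    for k, c in enumerate(levels, start=1):
        b_k = 1 - sum((comb(k - 1, j - 1) * levels[j - 1] ** (k - j) * b[j - 1]
                       for j in range(1, k)), Fraction(0))
        b.append(b_k)
        total += comb(n, k - 1) * c ** (n - k + 1) * b_k
    return total


def c_levels(heights, n, t):
    """c_k, k = 1, ..., m_t + 1: one minus the lowest step height of H that is
    at or above the line with ordinate t + (k-1)/n.  If the line crosses a jump,
    that height is the top of the jump; if it lies on a step, it is the ordinate."""
    m_t = floor(n * (1 - t))
    return [1 - min(h for h in heights if h >= t + Fraction(k - 1, n))
            for k in range(1, m_t + 2)]


def p_minus(heights, n, d_minus):
    """Critical level P(D- >= d-) from (9)."""
    return conover_tail(c_levels(heights, n, d_minus), n)


# Conover's example 1: H discrete uniform on 1, ..., 5 (step heights 0, .2, ..., 1)
heights = [Fraction(i, 5) for i in range(6)]
print(float(p_minus(heights, 10, Fraction(0))))     # 1.0
print(float(p_minus(heights, 10, Fraction(2, 5))))
```

With Conover's example 1, d⁻ = 0 gives a critical level of exactly 1, matching PDMNS in Appendix A.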
b. Critical Level for D⁺

Let d⁺ = supₓ (Sₙ(x) − H(x)) be determined from the observations. For each k such that 1 ≤ k ≤ n(1 − d⁺) + 1, draw a horizontal line with ordinate 1 − (k − 1)/n − d⁺ on the graph of H. fₖ is then this ordinate unless the line intersects the graph of H at a discontinuity of H, in which case fₖ is equal to the height of H at the bottom of the jump. The eₖ's are computed using (4), and (6) is used to compute the critical level, P(D⁺ ≥ d⁺).
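Section b mirrors section a. In this sketch, H is again assumed to be given as the list of its step heights (including 0 and 1) in exact fractions, and the highest step height at or below each line covers both cases of the rule: the bottom of the jump when the line crosses a jump, and the ordinate itself when the line meets a step. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, floor


def conover_tail(levels, n):
    """sum_k C(n, k-1) f_k**(n-k+1) e_k with the e_k of recursion (4)."""
    e = []
    total = Fraction(0)
    for k, f in enumerate(levels, start=1):
        e_k = 1 - sum((comb(k - 1, j - 1) * levels[j - 1] ** (k - j) * e[j - 1]
                       for j in range(1, k)), Fraction(0))
        e.append(e_k)
        total += comb(n, k - 1) * f ** (n - k + 1) * e_k
    return total


def f_levels(heights, n, t):
    """f_k, k = 1, ..., m_t + 1: the highest step height of H at or below the
    line with ordinate 1 - (k-1)/n - t.  If the line crosses a jump this is the
    bottom of the jump; if it lies on a step it is the ordinate itself."""
    m_t = floor(n * (1 - t))
    return [max(h for h in heights if h <= 1 - Fraction(k - 1, n) - t)
            for k in range(1, m_t + 2)]


def p_plus(heights, n, d_plus):
    """Critical level P(D+ >= d+) from (6)."""
    return conover_tail(f_levels(heights, n, d_plus), n)


heights = [Fraction(i, 5) for i in range(6)]  # Conover's example 1: uniform on 1, ..., 5
print(float(p_plus(heights, 10, Fraction(2, 5))))
```

For example 1 of Appendix A (d⁺ = .4) this returns about .020809, the value reported there for PDPLS.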
c. Critical Level for D

Let d = supₓ |H(x) − Sₙ(x)| be determined from the observations. P(D⁻ ≥ d) and P(D⁺ ≥ d) are computed using (9) and (6), respectively, as described above, and (10) is used to put bounds on the critical level, P(D ≥ d).
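Combining the two one-sided rules gives the bounds of section c. The sketch below assumes H is given as the list of its step heights, including 0 and 1, in exact fractions; the function names are hypothetical, and exact arithmetic replaces double precision.

```python
from fractions import Fraction
from math import comb, floor


def conover_tail(levels, n):
    """Shared form of (4)/(6) and (7)/(9), evaluated exactly."""
    a, total = [], Fraction(0)
    for k, g in enumerate(levels, start=1):
        a_k = 1 - sum((comb(k - 1, j - 1) * levels[j - 1] ** (k - j) * a[j - 1]
                       for j in range(1, k)), Fraction(0))
        a.append(a_k)
        total += comb(n, k - 1) * g ** (n - k + 1) * a_k
    return total


def critical_level_bounds(heights, n, d):
    """Bounds (10) on P(D >= d), combining P(D+ >= d) from (6) and P(D- >= d) from (9)."""
    m = floor(n * (1 - d))
    f = [max(h for h in heights if h <= 1 - d - Fraction(k - 1, n)) for k in range(1, m + 2)]
    c = [1 - min(h for h in heights if h >= d + Fraction(k - 1, n)) for k in range(1, m + 2)]
    p_plus, p_minus = conover_tail(f, n), conover_tail(c, n)
    return p_plus + p_minus - p_plus * p_minus, p_plus + p_minus


heights = [Fraction(i, 5) for i in range(6)]  # Conover's example 1, d = .4
low, high = critical_level_bounds(heights, 10, Fraction(2, 5))
print(float(low), float(high))
```

For example 1 of Appendix A (d = .4) the printed bounds are about .041184 and .041617, the values reported there for PDL and PD.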


C. SUBROUTINE "DISKS"


The calculations of critical levels described above can be very time consuming, especially as the number of observations increases. For this reason, subroutine DISKS (Appendix A) was developed to perform these calculations. Subroutine DISKS calculates the critical levels of equations (6) and (9) and the bounds on the critical level of D in (10) for most discrete distributions (see Appendix A for limitations). Subroutine DISKS was used to calculate critical levels for various examples, and the results were verified against critical levels calculated by hand.

Subroutine DISKS can be modified slightly to calculate the exact size of a critical region for a test. For example, with a sample of size 10, the critical region of size .1 determined from the standard tables for continuous distributions consists of values of D greater than .369. By inserting the value .369 for d, together with the hypothesized distribution H, into a modified version of DISKS, the size of the test when H is discontinuous (which we know is less than .1) can be calculated, to within the bounds of (10) since D is two-sided.
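The modification described above amounts to evaluating the bounds (10) at the tabled critical value instead of at an observed d. In this sketch, H is taken for illustration to be the discrete uniform distribution on 1, …, 5 with n = 10; the helper names are hypothetical, and exact arithmetic replaces double precision.

```python
from fractions import Fraction
from math import comb, floor


def conover_tail(levels, n):
    """Shared form of (4)/(6) and (7)/(9), evaluated exactly."""
    a, total = [], Fraction(0)
    for k, g in enumerate(levels, start=1):
        a_k = 1 - sum((comb(k - 1, j - 1) * levels[j - 1] ** (k - j) * a[j - 1]
                       for j in range(1, k)), Fraction(0))
        a.append(a_k)
        total += comb(n, k - 1) * g ** (n - k + 1) * a_k
    return total


def size_of_tabled_region(heights, n, critical):
    """Bounds (10) on P(D >= critical) under a discrete H given by its step heights."""
    m = floor(n * (1 - critical))
    f = [max(h for h in heights if h <= 1 - critical - Fraction(k - 1, n)) for k in range(1, m + 2)]
    c = [1 - min(h for h in heights if h >= critical + Fraction(k - 1, n)) for k in range(1, m + 2)]
    p_plus, p_minus = conover_tail(f, n), conover_tail(c, n)
    return p_plus + p_minus - p_plus * p_minus, p_plus + p_minus


# Illustrative H: discrete uniform on 1, ..., 5; n = 10; tabled region D > .369 (size .1)
heights = [Fraction(i, 5) for i in range(6)]
low, high = size_of_tabled_region(heights, 10, Fraction(369, 1000))
print(f"exact size lies between {float(low):.6f} and {float(high):.6f}")
```

For this H with n = 10, H is a multiple of 1/5 and Sₙ a multiple of 1/10, so D is a multiple of 1/10 and P(D ≥ .369) coincides with P(D > .369); the printed bounds come out near .0412 and .0416, far below the nominal .1.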
III. DISTRIBUTIONS OF TEST STATISTICS





A. ANALYTICAL DISTRIBUTIONS 


The asymptotic distributions of D⁻, D⁺, and D have been studied by several people for the case when H is not continuous. Schmid [8] showed that the limiting distributions of D⁻, D⁺, and D do exist but are no longer independent of H; they depend on the values of H at the discontinuity points. Schmid showed, for example, that if H is discontinuous only at the points x₁, x₂, …, xₛ, then

$$\lim_{n \to \infty} P\left(D < \frac{k}{\sqrt{n}}\right) = G(k),$$

where G(k) has the form

$$G(k) = \frac{1}{b} \int_{A_1} \cdots \int_{A_s} \exp\left(-\frac{1}{2} \sum_{j,m=1}^{s} a_{jm} x_j x_m\right) dx_1 \cdots dx_s,$$

in which the regions of integration A₁, …, Aₛ depend on k, and the coefficients aⱼₘ and the constant b are determined by the values of H at its discontinuity points.

Unfortunately, G(k) becomes undefined when H is discrete, since the a's blow up and b becomes zero. Conover [3] tried, as did this author, to use the distributions of Section II to derive the asymptotic distributions, but the attempts were unsuccessful. For these reasons, a computer routine using subroutine DISKS was used to investigate the asymptotic properties of the distributions of D⁻, D⁺, and D. Since the formulations of the limiting distributions in the literature involve multiples of the inverse of the square root of the sample size, it was decided that values of k would be determined such that

$$\lim_{n \to \infty} P\left(D \ge \frac{k}{\sqrt{n}}\right) = \alpha$$

for various values of α. The asymptotic distributions of D⁻ and D⁺ were not studied, since they display the same basic characteristics as the asymptotic distribution of D.


B. COMPUTER PROGRAM USED 

Subroutine DISKS was modified to search for the value of k such that P(D ≥ k/√n) was as close as possible to, but always less than, a predetermined value of α. Values of n between thirty and one hundred, in increments of five, were used to determine k such that P(D ≥ k/√n) ≈ α, using (10). Values of n between eighty-five and one hundred were sometimes not used, since significant errors occurred in the calculations even with double precision arithmetic.
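The search can be sketched as follows. Because D is always one of the differences |i/n − h| between a value of Sₙ and a step height h of H, only those values need to be examined, and since the upper bound in (10) does not increase with t, a binary search finds the smallest such value whose bound is below α. Reading "as close to, but always less than, α" as this rule, using the upper bound of (10), and reporting k at the attainable value itself are assumptions of this sketch, so its output need not coincide with the tabulated values of k. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, floor, sqrt


def conover_tail(levels, n):
    """Shared form of (4)/(6) and (7)/(9), in exact arithmetic."""
    a, total = [], Fraction(0)
    for k, g in enumerate(levels, start=1):
        a_k = 1 - sum((comb(k - 1, j - 1) * levels[j - 1] ** (k - j) * a[j - 1]
                       for j in range(1, k)), Fraction(0))
        a.append(a_k)
        total += comb(n, k - 1) * g ** (n - k + 1) * a_k
    return total


def upper_bound(heights, n, t):
    """Upper bound in (10): P(D+ >= t) + P(D- >= t), using the f_k and c_k rules."""
    m_t = floor(n * (1 - t))
    f = [max(h for h in heights if h <= 1 - t - Fraction(k - 1, n)) for k in range(1, m_t + 2)]
    c = [1 - min(h for h in heights if h >= t + Fraction(k - 1, n)) for k in range(1, m_t + 2)]
    return conover_tail(f, n) + conover_tail(c, n)


def search_k(heights, n, alpha):
    """Return (k, t): t is the smallest attainable value of D whose upper bound
    on P(D >= t) is below alpha, and k = t * sqrt(n)."""
    # D is always one of the differences |i/n - h| between S_n and a step height of H
    candidates = sorted({abs(Fraction(i, n) - h) for i in range(n + 1) for h in heights})
    lo, hi = 0, len(candidates) - 1  # the bound is 0 at the largest candidate (t = 1)
    while lo < hi:                   # the bound is non-increasing in t: binary search
        mid = (lo + hi) // 2
        if upper_bound(heights, n, candidates[mid]) < alpha:
            hi = mid
        else:
            lo = mid + 1
    t = candidates[lo]
    return float(t) * sqrt(n), t


heights = [Fraction(i, 10) for i in range(11)]  # discrete uniform with 10 mass points
k, t = search_k(heights, 30, Fraction(1, 20))
print(t, round(k, 3))
```

Repeating the call for n = 30, 35, …, 100 and recording the resulting k values reproduces the kind of sequence examined in the next subsection.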

The modified subroutine was used to investigate the asymptotic distribution of D when H was one of the following distributions:

1. Discrete uniform with parameter m:

$$H(x) = \begin{cases} 0, & x < 1, \\ k/m, & k \le x < k+1, \quad k = 1, 2, \ldots, m-1, \\ 1, & x \ge m. \end{cases}$$

2. Poisson with parameter λ:

$$H(x) = \sum_{k=0}^{[x]} \frac{e^{-\lambda} \lambda^{k}}{k!}, \quad \text{where } [x] = \text{largest integer} \le x.$$

3. Geometric with parameter p:

$$H(x) = \sum_{k=1}^{[x]} p(1-p)^{k-1}.$$

Each distribution was investigated for various values of its respective parameter. The values of k determined for the various values of n for each particular parametric distribution were examined to determine whether they appeared to be converging to some common value. The fact that the distribution of D is discrete suggested that the values of k would not converge in a uniform manner, but it was hoped that, even though they jumped around somewhat, the convergence to a common value would be evident. By varying the values of the parameters of the various distributions, these discrete distributions would approach (in the weak convergence sense) a continuous distribution, and the limiting value of k should approach the known limiting values of k for continuous distributions.

For example, as m in the discrete uniform distribution increases, H has smaller and smaller jumps at each mass point and becomes "smoother" looking. If we think of the mass points as being evenly distributed between zero and one, then, as the number of mass points increases, H behaves in most respects more and more like a continuous uniform distribution function between zero and one. Similarly, as the parameter of the Poisson gets larger and as the parameter of the geometric gets smaller, these hypothesized cumulative distribution functions have smaller and smaller jumps at their points of discontinuity, and the distribution functions get smoother and smoother.
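One crude way to see this smoothing is to track the largest jump of H as the parameter changes. The measure is an illustration chosen for this sketch rather than one used in the thesis; it relies on the fact that the largest Poisson probability occurs at the mode floor(λ) and the largest geometric probability at k = 1.

```python
import math


def max_jump_uniform(m):
    """Largest jump of the discrete uniform H with m mass points."""
    return 1 / m


def max_jump_poisson(lam):
    """Largest jump of the Poisson H: the probability at the mode floor(lam)."""
    mode = math.floor(lam)
    return math.exp(-lam) * lam ** mode / math.factorial(mode)


def max_jump_geometric(p):
    """Largest jump of the geometric H on 1, 2, ...: p(1-p)**(k-1) peaks at k = 1."""
    return p


uniform = [max_jump_uniform(m) for m in (5, 10, 20, 200)]
poisson = [max_jump_poisson(lam) for lam in (1, 2, 5, 10, 20)]
geometric = [max_jump_geometric(p) for p in (0.5, 0.2, 0.1, 0.05)]
for name, jumps in (("uniform", uniform), ("Poisson", poisson), ("geometric", geometric)):
    print(name, [round(j, 4) for j in jumps])
```

In each family the largest jump shrinks in the direction described above: increasing m, increasing λ, and decreasing p.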

Since the usual K-S test is conservative when H is discrete, the approximating values of k for the discrete case should always be smaller than the known limiting values of k for the continuous case.


C. RESULTS


For each parametric distribution considered, as n increased, the sequence of values of k did appear to converge but, as anticipated, not monotonically. Typical values of k determined for various values of n are tabulated below:


  n        k
 30      1.095
 35        …
 40        …
 45        …
 50        …
 55      1.146
 60        …
 65        …
 70      1.165
 75        …
 80      1.148
 90        …


These values of k were determined for the discrete uniform distribution with 10 mass points and α = .05. The variation in k as n increases is apparent, but the value of k does appear to be fairly constant for n greater than 50. As the parameters of the three distributions were changed and the discrete distributions became "smoother" looking, as described in Section III.B, the variation in k became smaller than that in the table above. In each parametric case that was examined, the values of k for n > 50 rarely differed from each other by more than .03, as in the above example. The general tendency was for k to increase as n increased and then to become relatively stable for n > 50. For n ≥ 50, the smallest value of k thus obtained was recorded, and then all the values of k for the various values of the parameters of each distribution were plotted. Figures 1, 2, and 3 show a smooth curve approximation through the plotted k values for the three distributions, with dotted lines representing the asymptotic value of k for the continuous case.
Figure 1 shows the values of k for the discrete uniform distribution for various numbers of mass points. The conservativeness of the continuous K-S test is readily apparent from this plot. For example, with twenty mass points the asymptotic approximation is 1.16, while in the regular K-S test the asymptotic value of k is 1.36. As the number of mass points increases, the value of k increases toward the continuous K-S value. One of the surprising results is how slowly k converges to the continuous K-S value. Even with two hundred mass points at α = .05, k = 1.30, which differs from 1.36 by more than was expected.

Figure 2 depicts the values of k for the Poisson distribution with various values of the parameter. The curves have the same general appearance as those in Figure 1, and the same comments made about the discrete uniform apply here.

Values of k determined for the geometric distribution with various values of the parameter are plotted in Figure 3. The curves here are similar to those for the two preceding distributions, except that the value of k appears to converge to the continuous K-S value of k as the parameter decreases. With this qualification, all of the previous comments apply here.
IV. SUMMARY AND CONCLUSIONS


1. The K-S test using the standard tabled critical values is conservative when the hypothesized distribution, H, is discrete. The test is sometimes substantially conservative, as indicated in Figures 1, 2, and 3. The power of the test is reduced when the test is conservative and, therefore, it is desirable to know the exact size of a test instead of a conservative bound on it.

2. Conover's procedure can be used to obtain exact (approx- 
imate in the two-sided case) critical levels for a K-S test when 
H is discontinuous or when the data have been grouped. The 
procedure can also be used to find the exact amount of conser- 
vatism of a K-S test if the standard tables are used. The 
only drawbacks to the procedure are the lengthy and tedious 
calculations required. 

3. Subroutine DISKS was developed and tested to calculate the critical levels in Conover's procedure for many discrete distributions.

4. As the sample size increases, limiting distributions of the test statistics D, D⁻, and D⁺ exist for discontinuous H, but the closed-form limiting distributions investigated are all degenerate when H is discrete. Subroutine DISKS may be modified slightly to obtain an approximation to the limiting values of k such that P(D ≥ k/√n) = α for any 0 < α < 1.
5. The limiting values of k above were approximated as described for three distribution families. As n increased, k had a general tendency to increase and then become fairly constant for n > 50. As the parameter of each family changed such that H had smaller jumps at its mass points and became "smoother" looking, k approached the limiting value of k found in the standard K-S tables. Significantly, this convergence of k to the limiting value for the continuous case was much slower than anticipated.

6. Figures 1, 2, and 3 indicate that each family of distributions has a distinctive set of similar curves. Further investigation seems warranted to find an easy and quick means of modifying the existing K-S tables for use in a K-S test when H is discrete. This would involve determining, for each family of discrete distributions, a function depending on n, α, and the parameters of the family that would modify the critical values in the standard K-S tables for continuous distributions to give critical values for that particular family of distributions.
APPENDIX A


USE OF SUBROUTINE DISKS


A. PURPOSE OF SUBROUTINE 
Subroutine DISKS uses Conover's [3] procedure to compute the critical level (the probability of getting a value of the test statistic as large as the observed value when H₀: F(x) = H(x) is true) of a Kolmogorov goodness-of-fit test when the hypothesized distribution is discrete. If Sₙ is the cumulative empirical distribution of the sample, then the following test statistics are used for the specified alternative hypotheses: (1) alternatives of the type F ≠ H use D = supₓ |H(x) − Sₙ(x)|; (2) alternatives of the type F < H use D⁻ = supₓ (H(x) − Sₙ(x)); and (3) alternatives of the type F > H use D⁺ = supₓ (Sₙ(x) − H(x)). For a given hypothesized distribution and a sample from the distribution to be tested, the subroutine determines the observed values of D, D⁻, and D⁺. If these observed values are d, d⁻, and d⁺, respectively, then the subroutine computes the double precision quantities PDMNS, PDPLS, PDL, and PD, where:

PDMNS = Prob(D⁻ ≥ d⁻)

PDPLS = Prob(D⁺ ≥ d⁺)

PDL ≤ Prob(D ≥ d) ≤ PD
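The quantities above can be mirrored in a short Python sketch. The name `disks_sketch`, its arguments (the sample and a dictionary mapping each mass point to its exact H value), and the use of exact fractions are assumptions for illustration; the original input conventions (ITYPE, X, H, M, N, S) are not reproduced.

```python
from fractions import Fraction
from math import comb, floor


def conover_tail(levels, n):
    """Shared form of (4)/(6) and (7)/(9), evaluated exactly."""
    a, total = [], Fraction(0)
    for k, g in enumerate(levels, start=1):
        a_k = 1 - sum((comb(k - 1, j - 1) * levels[j - 1] ** (k - j) * a[j - 1]
                       for j in range(1, k)), Fraction(0))
        a.append(a_k)
        total += comb(n, k - 1) * g ** (n - k + 1) * a_k
    return total


def tail_plus(heights, n, t):
    """P(D+ >= t) via (6), with f_k from the bottom-of-jump rule."""
    m = floor(n * (1 - t))
    return conover_tail([max(h for h in heights if h <= 1 - t - Fraction(k - 1, n))
                         for k in range(1, m + 2)], n)


def tail_minus(heights, n, t):
    """P(D- >= t) via (9), with c_k from the top-of-jump rule."""
    m = floor(n * (1 - t))
    return conover_tail([1 - min(h for h in heights if h >= t + Fraction(k - 1, n))
                         for k in range(1, m + 2)], n)


def disks_sketch(sample, cdf):
    """Return (PDMNS, PDPLS, PDL, PD) as floats; `cdf` maps each mass point to H."""
    n = len(sample)
    heights = [Fraction(0)] + [Fraction(cdf[a]) for a in sorted(cdf)]
    d_minus = d_plus = Fraction(0)
    for a in sorted(cdf):
        s = Fraction(sum(x <= a for x in sample), n)  # S_n at the mass point a
        d_minus = max(d_minus, Fraction(cdf[a]) - s)
        d_plus = max(d_plus, s - Fraction(cdf[a]))
    d = max(d_minus, d_plus)
    p_plus, p_minus = tail_plus(heights, n, d), tail_minus(heights, n, d)
    return (float(tail_minus(heights, n, d_minus)), float(tail_plus(heights, n, d_plus)),
            float(p_plus + p_minus - p_plus * p_minus), float(p_plus + p_minus))


H = {i: Fraction(i, 5) for i in range(1, 6)}
pdmns, pdpls, pdl, pd = disks_sketch([1, 1, 1, 2, 2, 2, 3, 3, 3, 3], H)
print(pdmns, pdpls, pdl, pd)
```

For example 1 of Section E this returns 1.0, .020809, .041184, and .041617 after rounding to six places.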
B. INPUT TO SUBROUTINE

1. ITYPE = 1
If all of the possible mass points of the hypothesized distribution are represented in the data, then ITYPE = 1 and the following quantities must be provided:
X -- N-dimensional vector containing the sample
H -- (M+1)-dimensional vector containing the values of the hypothesized cumulative distribution
M -- the number of distinct data points
N -- the total number of data points, less than or equal to thirty (30)
S -- a dummy vector of length (M+1)

2. ITYPE = 2
If not all of the possible mass points of the hypothesized distribution are represented, then ITYPE = 2 and the above input is modified by making X a dummy vector and S a vector of the values of the cumulative empirical distribution.


C. LIMITATIONS 


The only limitation of the subroutine is that N must be less than or equal to thirty (30). For N larger than thirty (30), the user need only modify the second and third dimension statements of the program by changing 30 to the number desired. The user should be cautioned that, as N gets large (about one hundred (100)), the nature of the calculations causes significant errors to propagate, even with double precision arithmetic.


D. TIME AND CORE REQUIREMENTS


All of the times and core requirements that follow are based on runs of DISKS at the W. R. Church Computer Center, Naval Postgraduate School, Monterey, California, on an IBM 360. The subroutine requires approximately 11K of core for storage and 6.5 seconds to compile. Execution time is approximately .4 seconds for N = 10, .5 seconds for N = 20, and .55 seconds for N = 30.


E. VERIFICATION

Fifteen examples were used to verify that subroutine DISKS calculated the desired quantities correctly. In each example, the calculations were performed by hand using Conover's procedure and then compared with the computer-calculated values. Examples were formulated to exercise each "if" statement and each branching point in the subroutine at various levels of M and N. The following three examples used in the verification process are listed here to indicate the general types of examples used:

1. This is example 1 from Conover [3]. Let H be the discrete uniform distribution with 5 mass points on the integers 1, 2, 3, 4, 5. Suppose a random sample of size 10 with (ordered) values 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 is drawn from some population. Hand calculation shows d⁻ = 0.0, d⁺ = .4, and d = .4, yielding:

P(D⁻ ≥ d⁻) = 1.0
P(D⁺ ≥ d⁺) = .02081
0.04118 ≤ P(D ≥ d) ≤ 0.04162

Subroutine DISKS yielded:

PDMNS = 1.0
PDPLS = 0.020809
PDL = 0.041184, PD = 0.041617

2. This example is from Darmosiswoys [5], page 24, with mass points 1, 2, and 3 such that P(X = 1) = .3624, P(X = 2) = .4167, and P(X = 3) = .2209 (X is a function of an exponential random variable, Y, with parameter 6.0, defined by X = 1 if Y ≤ 2.7, X = 2 if 2.7 < Y ≤ 9.09, and X = 3 if Y > 9.09). This example shows how to handle data that have been grouped when the original sample cannot be recovered. A random sample of size 15 with values 1, 2, 3, 2, 3, 3, 1, 1, … is drawn from some population. Hand calculation yielded:

P(D ≥ d) = 0.0557

Subroutine DISKS yielded:

PDL = 0.055174, PD = 0.055817


3. This example illustrates how to handle discrete distributions with a countable number of mass points. Let H be the Poisson distribution with parameter 0.7. Suppose a random sample of size 10 with values 1, 3, 2, 1, 0, 1, 0, 2, … is drawn from some population. Hand calculations yielded:

P(D⁻ ≥ d⁻) = 0.014774
P(D⁺ ≥ d⁺) = 0.84238
P(D ≥ d) ≤ 0.02386
Since the number of distinct mass points is infinite, some value of M must be decided upon for use in the program. H is truncated so that all the probability associated with mass points beyond the (M+1)st mass point is assigned to the (M+1)st mass point, with a corresponding grouping of the sample data if necessary. With M = 4, ITYPE = 1, and P(X > 3) = 1 − H(3) = .0291 is added to P(X = 3). In this case, DISKS yielded:

PDMNS = 0.014768
PDPLS = …
PDL = …, PD = 0.023277

With M = 6, ITYPE = 2, and P(X > 5) = 1 − H(5) = 0.0001 to four decimal places. In this case, DISKS yielded:

PDMNS = 0.014772
PDPLS = …
PDL = 0.023156, PD = 0.02382

The hypothesized distribution actually used is thus a truncated distribution, but if the probability of all the mass points beyond the (M+1)st mass point is relatively small, as in the above case with M = 6, the critical levels calculated by DISKS are very good approximations to the critical levels of the untruncated hypothesized distribution.
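The truncation step can be sketched as follows, assuming, as in the M = 6 case above, that the retained mass points are 0, 1, …, 5 and that all remaining Poisson probability is folded into the last of them. The function name is hypothetical.

```python
import math


def truncated_poisson_cdf(lam, num_points):
    """Cumulative values of a Poisson(lam) distribution at the mass points
    0, 1, ..., num_points - 1, with all probability beyond the last retained
    point assigned to that point.  Returns (cdf, tail), where tail is the
    probability that was folded into the last point."""
    cdf, running = [], 0.0
    for x in range(num_points):
        running += math.exp(-lam) * lam ** x / math.factorial(x)
        cdf.append(running)
    tail = 1.0 - cdf[-1]  # P(X > num_points - 1) under the untruncated H
    cdf[-1] = 1.0
    return cdf, tail


cdf, tail = truncated_poisson_cdf(0.7, 6)  # mass points 0, ..., 5
print([round(h, 4) for h in cdf], round(tail, 4))
```

With λ = 0.7 the folded tail rounds to .0001, as stated above for M = 6.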